This notebook documents my exploratory analysis of COVID-19 data from India. There are two datasets. The first contains the recorded COVID-19 cases in India on a day-to-day basis: 18110 observations of 9 variables, including Date, State/UnionTerritory, ConfirmedIndianNational, ConfirmedForeignNational, and several others. The second contains information about COVID-19 vaccinations in India: 7845 observations of 24 variables, including Updated On, State, Total Doses Administered, First Dose Administered, Second Dose Administered, and many others.
#import all packages needed for the analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
from datetime import datetime
#load in your dataset with pandas
covid_df = pd.read_csv('covid_19_india.csv')
#get the first five rows of the data
covid_df.head()
|  | Sno | Date | Time | State/UnionTerritory | ConfirmedIndianNational | ConfirmedForeignNational | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020-01-30 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 1 | 2 | 2020-01-31 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 2 | 3 | 2020-02-01 | 6:00 PM | Kerala | 2 | 0 | 0 | 0 | 2 |
| 3 | 4 | 2020-02-02 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
| 4 | 5 | 2020-02-03 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
#get the summary of the data
covid_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18110 entries, 0 to 18109
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Sno                       18110 non-null  int64
 1   Date                      18110 non-null  object
 2   Time                      18110 non-null  object
 3   State/UnionTerritory      18110 non-null  object
 4   ConfirmedIndianNational   18110 non-null  object
 5   ConfirmedForeignNational  18110 non-null  object
 6   Cured                     18110 non-null  int64
 7   Deaths                    18110 non-null  int64
 8   Confirmed                 18110 non-null  int64
dtypes: int64(4), object(5)
memory usage: 1.2+ MB
#get the descriptive statistics of numeric variables
covid_df.describe()
|  | Sno | Cured | Deaths | Confirmed |
|---|---|---|---|---|
| count | 18110.000000 | 1.811000e+04 | 18110.000000 | 1.811000e+04 |
| mean | 9055.500000 | 2.786375e+05 | 4052.402264 | 3.010314e+05 |
| std | 5228.051023 | 6.148909e+05 | 10919.076411 | 6.561489e+05 |
| min | 1.000000 | 0.000000e+00 | 0.000000 | 0.000000e+00 |
| 25% | 4528.250000 | 3.360250e+03 | 32.000000 | 4.376750e+03 |
| 50% | 9055.500000 | 3.336400e+04 | 588.000000 | 3.977350e+04 |
| 75% | 13582.750000 | 2.788698e+05 | 3643.750000 | 3.001498e+05 |
| max | 18110.000000 | 6.159676e+06 | 134201.000000 | 6.363442e+06 |
#load in the vaccine dataset
vaccine_df = pd.read_csv('covid_vaccine_statewise.csv')
#get the first five rows of the dataset
vaccine_df.head()
|  | Updated On | State | Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | ... | 18-44 Years (Doses Administered) | 45-60 Years (Doses Administered) | 60+ Years (Doses Administered) | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | Male(Individuals Vaccinated) | Female(Individuals Vaccinated) | Transgender(Individuals Vaccinated) | Total Individuals Vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16/01/2021 | India | 48276.0 | 3455.0 | 2957.0 | 48276.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 23757.0 | 24517.0 | 2.0 | 48276.0 |
| 1 | 17/01/2021 | India | 58604.0 | 8532.0 | 4954.0 | 58604.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 27348.0 | 31252.0 | 4.0 | 58604.0 |
| 2 | 18/01/2021 | India | 99449.0 | 13611.0 | 6583.0 | 99449.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 41361.0 | 58083.0 | 5.0 | 99449.0 |
| 3 | 19/01/2021 | India | 195525.0 | 17855.0 | 7951.0 | 195525.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 81901.0 | 113613.0 | 11.0 | 195525.0 |
| 4 | 20/01/2021 | India | 251280.0 | 25472.0 | 10504.0 | 251280.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 98111.0 | 153145.0 | 24.0 | 251280.0 |
5 rows × 24 columns
#get the summary of the data
vaccine_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7845 entries, 0 to 7844
Data columns (total 24 columns):
 #   Column                               Non-Null Count  Dtype
---  ------                               --------------  -----
 0   Updated On                           7845 non-null   object
 1   State                                7845 non-null   object
 2   Total Doses Administered             7621 non-null   float64
 3   Sessions                             7621 non-null   float64
 4   Sites                                7621 non-null   float64
 5   First Dose Administered              7621 non-null   float64
 6   Second Dose Administered             7621 non-null   float64
 7   Male (Doses Administered)            7461 non-null   float64
 8   Female (Doses Administered)          7461 non-null   float64
 9   Transgender (Doses Administered)     7461 non-null   float64
 10  Covaxin (Doses Administered)         7621 non-null   float64
 11  CoviShield (Doses Administered)      7621 non-null   float64
 12  Sputnik V (Doses Administered)       2995 non-null   float64
 13  AEFI                                 5438 non-null   float64
 14  18-44 Years (Doses Administered)     1702 non-null   float64
 15  45-60 Years (Doses Administered)     1702 non-null   float64
 16  60+ Years (Doses Administered)       1702 non-null   float64
 17  18-44 Years(Individuals Vaccinated)  3733 non-null   float64
 18  45-60 Years(Individuals Vaccinated)  3734 non-null   float64
 19  60+ Years(Individuals Vaccinated)    3734 non-null   float64
 20  Male(Individuals Vaccinated)         160 non-null    float64
 21  Female(Individuals Vaccinated)       160 non-null    float64
 22  Transgender(Individuals Vaccinated)  160 non-null    float64
 23  Total Individuals Vaccinated         5919 non-null   float64
dtypes: float64(22), object(2)
memory usage: 1.4+ MB
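Unlike the case data, the vaccine data has many missing values (for example, only 160 of the 7845 rows have a Male(Individuals Vaccinated) count). A minimal sketch of ranking columns by their share of missing values, shown on a small hypothetical frame rather than the real CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for vaccine_df (values are illustrative only)
toy = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "Total Doses Administered": [10.0, 20.0, np.nan, 40.0],
    "AEFI": [np.nan, np.nan, np.nan, 1.0],
})

# Percentage of missing values per column, highest first
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

On the real data the same expression would be `vaccine_df.isnull().mean().mul(100).sort_values(ascending=False)`.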
#check for missing values (the covid dataset has none)
covid_df.isnull().sum()
Sno                         0
Date                        0
Time                        0
State/UnionTerritory        0
ConfirmedIndianNational     0
ConfirmedForeignNational    0
Cured                       0
Deaths                      0
Confirmed                   0
dtype: int64
#drop columns that would not be used in the analysis
covid_df.drop(['Sno', 'Time', 'ConfirmedIndianNational','ConfirmedForeignNational'], inplace=True, axis=1)
#check that columns were dropped
covid_df.head()
|  | Date | State/UnionTerritory | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|
| 0 | 2020-01-30 | Kerala | 0 | 0 | 1 |
| 1 | 2020-01-31 | Kerala | 0 | 0 | 1 |
| 2 | 2020-02-01 | Kerala | 0 | 0 | 2 |
| 3 | 2020-02-02 | Kerala | 0 | 0 | 3 |
| 4 | 2020-02-03 | Kerala | 0 | 0 | 3 |
#change the date variable to date time format
covid_df['Date'] = pd.to_datetime(covid_df['Date'], format = '%Y-%m-%d')
#check that the format has been changed
covid_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18110 entries, 0 to 18109
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Date                  18110 non-null  datetime64[ns]
 1   State/UnionTerritory  18110 non-null  object
 2   Cured                 18110 non-null  int64
 3   Deaths                18110 non-null  int64
 4   Confirmed             18110 non-null  int64
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 707.5+ KB
#create a new column to show the number of active Cases
covid_df['Active_Cases']= covid_df['Confirmed']-(covid_df['Cured']+covid_df['Deaths'])
#check that the column has been added
covid_df.head()
|  | Date | State/UnionTerritory | Cured | Deaths | Confirmed | Active_Cases |
|---|---|---|---|---|---|---|
| 0 | 2020-01-30 | Kerala | 0 | 0 | 1 | 1 |
| 1 | 2020-01-31 | Kerala | 0 | 0 | 1 | 1 |
| 2 | 2020-02-01 | Kerala | 0 | 0 | 2 | 2 |
| 3 | 2020-02-02 | Kerala | 0 | 0 | 3 | 3 |
| 4 | 2020-02-03 | Kerala | 0 | 0 | 3 | 3 |
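The new column follows the identity Active_Cases = Confirmed - (Cured + Deaths). A small sanity check on hypothetical rows (not taken from the dataset) confirms the arithmetic and flags negative values, which would point to reporting inconsistencies rather than real case counts:

```python
import pandas as pd

# Hypothetical rows; column names mirror covid_df
toy = pd.DataFrame({
    "Confirmed": [100, 50, 30],
    "Cured": [60, 45, 31],
    "Deaths": [5, 2, 0],
})

toy["Active_Cases"] = toy["Confirmed"] - (toy["Cured"] + toy["Deaths"])

# Rows where cured + deaths exceed confirmed would be data-quality problems
negative = toy[toy["Active_Cases"] < 0]
print(toy["Active_Cases"].tolist())  # [35, 3, -1]
print(len(negative))  # 1
```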
#replace misspelt state names with their correct names (assign the result back rather than using inplace=True on a selected column)
covid_df['State/UnionTerritory'] = covid_df['State/UnionTerritory'].replace({'Maharashtra***': 'Maharashtra', 'Karanataka': 'Karnataka'})
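The statewise table further down still lists several spelling or annotation variants as separate entries ('Madhya Pradesh***', 'Bihar****', 'Telengana' and 'Himanchal Pradesh'). A sketch, assuming each variant refers to the same state as its correctly spelt counterpart, that folds them all in with one mapping passed to `Series.replace`; the name `NAME_FIXES` is hypothetical:

```python
import pandas as pd

# Variants seen in the statewise output; treating them as the same state is an assumption
NAME_FIXES = {
    "Maharashtra***": "Maharashtra",
    "Karanataka": "Karnataka",
    "Madhya Pradesh***": "Madhya Pradesh",
    "Bihar****": "Bihar",
    "Telengana": "Telangana",
    "Himanchal Pradesh": "Himachal Pradesh",
}

# Hypothetical toy column standing in for covid_df['State/UnionTerritory']
states = pd.Series(["Bihar****", "Telengana", "Kerala", "Himanchal Pradesh"])

# Exact-value replacement; names not in the mapping are left unchanged
cleaned = states.replace(NAME_FIXES)
print(cleaned.tolist())  # ['Bihar', 'Telangana', 'Kerala', 'Himachal Pradesh']
```

On the real data this would be `covid_df['State/UnionTerritory'] = covid_df['State/UnionTerritory'].replace(NAME_FIXES)`. Because the pivot table below takes the maximum per state, a merged variant simply contributes its figures to the same group, and the larger cumulative value wins.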
#create a pivot table to show the number of confirmed, deaths and cured cases in each state
statewise = pd.pivot_table(covid_df, values = ['Confirmed', 'Deaths', 'Cured'], index='State/UnionTerritory', aggfunc = 'max')
#create a column to calculate the recovery rate
statewise['Recovery_Rate'] = statewise['Cured']*100/statewise['Confirmed']
#create a column to calculate the mortality rate
statewise['Mortality_Rate'] = statewise['Deaths']*100/statewise['Confirmed']
#sort the dataset by a descending order of confirmed cases
statewise = statewise.sort_values(by = 'Confirmed', ascending = False)
#check that all changes have been reflected in the pivot table
statewise.head()
| State/UnionTerritory | Confirmed | Cured | Deaths | Recovery_Rate | Mortality_Rate |
|---|---|---|---|---|---|
| Maharashtra | 6363442 | 6159676 | 134201 | 96.797865 | 2.108937 |
| Kerala | 3586693 | 3396184 | 18004 | 94.688450 | 0.501967 |
| Karnataka | 2921049 | 2861499 | 36848 | 97.961349 | 1.261465 |
| Tamil Nadu | 2579130 | 2524400 | 34367 | 97.877967 | 1.332504 |
| Andhra Pradesh | 1985182 | 1952736 | 13564 | 98.365591 | 0.683262 |
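Confirmed, Cured and Deaths are cumulative counts (the Kerala rows above never decrease), so the maximum per state is its latest total, which is why the pivot uses `'max'`. The rates are then Recovery_Rate = 100 × Cured / Confirmed and Mortality_Rate = 100 × Deaths / Confirmed. A self-contained sketch of the same steps on hypothetical data:

```python
import pandas as pd

# Hypothetical cumulative daily figures for two states
toy = pd.DataFrame({
    "State/UnionTerritory": ["X", "X", "Y", "Y"],
    "Confirmed": [100, 200, 50, 80],
    "Cured": [50, 180, 20, 76],
    "Deaths": [2, 4, 1, 2],
})

# Latest (maximum) cumulative value per state
summary = pd.pivot_table(
    toy,
    values=["Confirmed", "Deaths", "Cured"],
    index="State/UnionTerritory",
    aggfunc="max",
)

# Percentages of confirmed cases that recovered or died
summary["Recovery_Rate"] = summary["Cured"] * 100 / summary["Confirmed"]
summary["Mortality_Rate"] = summary["Deaths"] * 100 / summary["Confirmed"]
summary = summary.sort_values(by="Confirmed", ascending=False)
print(summary)
```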
#change a column name to be more descriptive
vaccine_df.rename(columns = {'Updated On' : 'Vaccine_Date'}, inplace=True)
#check that the change has been effected
vaccine_df.head()
|  | Vaccine_Date | State | Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | ... | 18-44 Years (Doses Administered) | 45-60 Years (Doses Administered) | 60+ Years (Doses Administered) | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | Male(Individuals Vaccinated) | Female(Individuals Vaccinated) | Transgender(Individuals Vaccinated) | Total Individuals Vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16/01/2021 | India | 48276.0 | 3455.0 | 2957.0 | 48276.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 23757.0 | 24517.0 | 2.0 | 48276.0 |
| 1 | 17/01/2021 | India | 58604.0 | 8532.0 | 4954.0 | 58604.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 27348.0 | 31252.0 | 4.0 | 58604.0 |
| 2 | 18/01/2021 | India | 99449.0 | 13611.0 | 6583.0 | 99449.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 41361.0 | 58083.0 | 5.0 | 99449.0 |
| 3 | 19/01/2021 | India | 195525.0 | 17855.0 | 7951.0 | 195525.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 81901.0 | 113613.0 | 11.0 | 195525.0 |
| 4 | 20/01/2021 | India | 251280.0 | 25472.0 | 10504.0 | 251280.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 98111.0 | 153145.0 | 24.0 | 251280.0 |
5 rows × 24 columns
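Unlike Date in the case data, Vaccine_Date is still a string in day/month/year form (for example 16/01/2021). A sketch of converting it with an explicit format so that day and month cannot be swapped, shown on hypothetical values:

```python
import pandas as pd

# Hypothetical values mimicking vaccine_df['Vaccine_Date']
toy = pd.DataFrame({"Vaccine_Date": ["16/01/2021", "01/02/2021", "12/03/2021"]})

# Explicit day-first format: "01/02/2021" is parsed as 1 February, not 2 January
toy["Vaccine_Date"] = pd.to_datetime(toy["Vaccine_Date"], format="%d/%m/%Y")
print(toy["Vaccine_Date"].dt.month.tolist())  # [1, 2, 3]
```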
#drop columns that won't be used for the analysis
vaccination = vaccine_df.drop(columns = ['Sputnik V (Doses Administered)', 'AEFI', '18-44 Years (Doses Administered)', '45-60 Years (Doses Administered)', '60+ Years (Doses Administered)'])
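I dropped the Sputnik V column because only about 0.015% of the vaccinated individuals received this vaccine, and the AEFI (adverse events following immunization) column because it is not used in this analysis. I dropped the three age-group dose columns (18-44, 45-60 and 60+ Years (Doses Administered)) because I am more interested in the number of individuals who were vaccinated than in the number of doses each of them received.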
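A share like the Sputnik V figure can be checked against the data. Since the dose columns appear to be cumulative, one option is to compare Sputnik V doses with total doses on the latest national row. A sketch on hypothetical numbers (not the dataset's), assuming the rows are in chronological order; note this gives a share of doses rather than of individuals:

```python
import pandas as pd

# Hypothetical national rows, assumed to be in chronological order
toy = pd.DataFrame({
    "State": ["India", "India"],
    "Total Doses Administered": [1_000_000.0, 4_000_000.0],
    "Sputnik V (Doses Administered)": [0.0, 2_000.0],
})

# Cumulative columns: only the latest national row is needed
latest = toy[toy["State"] == "India"].iloc[-1]
sputnik_share = latest["Sputnik V (Doses Administered)"] / latest["Total Doses Administered"] * 100
print(round(sputnik_share, 3))  # 0.05
```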
#check that the columns have been dropped
vaccination.head()
|  | Vaccine_Date | State | Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | Covaxin (Doses Administered) | CoviShield (Doses Administered) | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | Male(Individuals Vaccinated) | Female(Individuals Vaccinated) | Transgender(Individuals Vaccinated) | Total Individuals Vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16/01/2021 | India | 48276.0 | 3455.0 | 2957.0 | 48276.0 | 0.0 | NaN | NaN | NaN | 579.0 | 47697.0 | NaN | NaN | NaN | 23757.0 | 24517.0 | 2.0 | 48276.0 |
| 1 | 17/01/2021 | India | 58604.0 | 8532.0 | 4954.0 | 58604.0 | 0.0 | NaN | NaN | NaN | 635.0 | 57969.0 | NaN | NaN | NaN | 27348.0 | 31252.0 | 4.0 | 58604.0 |
| 2 | 18/01/2021 | India | 99449.0 | 13611.0 | 6583.0 | 99449.0 | 0.0 | NaN | NaN | NaN | 1299.0 | 98150.0 | NaN | NaN | NaN | 41361.0 | 58083.0 | 5.0 | 99449.0 |
| 3 | 19/01/2021 | India | 195525.0 | 17855.0 | 7951.0 | 195525.0 | 0.0 | NaN | NaN | NaN | 3017.0 | 192508.0 | NaN | NaN | NaN | 81901.0 | 113613.0 | 11.0 | 195525.0 |
| 4 | 20/01/2021 | India | 251280.0 | 25472.0 | 10504.0 | 251280.0 | 0.0 | NaN | NaN | NaN | 3946.0 | 247334.0 | NaN | NaN | NaN | 98111.0 | 153145.0 | 24.0 | 251280.0 |
#drop the rows where State is 'India' (national totals) so that only states and union territories remain
vaccination = vaccination[vaccination.State != 'India']
vaccination.head()
|  | Vaccine_Date | State | Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | Covaxin (Doses Administered) | CoviShield (Doses Administered) | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | Male(Individuals Vaccinated) | Female(Individuals Vaccinated) | Transgender(Individuals Vaccinated) | Total Individuals Vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 212 | 16/01/2021 | Andaman and Nicobar Islands | 23.0 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | NaN | NaN | NaN | NaN | NaN | NaN | 23.0 |
| 213 | 17/01/2021 | Andaman and Nicobar Islands | 23.0 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | NaN | NaN | NaN | NaN | NaN | NaN | 23.0 |
| 214 | 18/01/2021 | Andaman and Nicobar Islands | 42.0 | 9.0 | 2.0 | 42.0 | 0.0 | 29.0 | 13.0 | 0.0 | 0.0 | 42.0 | NaN | NaN | NaN | NaN | NaN | NaN | 42.0 |
| 215 | 19/01/2021 | Andaman and Nicobar Islands | 89.0 | 12.0 | 2.0 | 89.0 | 0.0 | 53.0 | 36.0 | 0.0 | 0.0 | 89.0 | NaN | NaN | NaN | NaN | NaN | NaN | 89.0 |
| 216 | 20/01/2021 | Andaman and Nicobar Islands | 124.0 | 16.0 | 3.0 | 124.0 | 0.0 | 67.0 | 57.0 | 0.0 | 0.0 | 124.0 | NaN | NaN | NaN | NaN | NaN | NaN | 124.0 |
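The India rows are national aggregates, so keeping them alongside the state rows would count every vaccination twice in any sum over the State column. A toy illustration with hypothetical numbers, assuming the national row equals the sum of the state rows for the same day:

```python
import pandas as pd

# Hypothetical single-day snapshot: two states plus the national aggregate row
toy = pd.DataFrame({
    "State": ["India", "State A", "State B"],
    "Total Individuals Vaccinated": [300.0, 100.0, 200.0],
})

# Including the national row doubles the total
with_national = toy["Total Individuals Vaccinated"].sum()
states_only = toy.loc[toy["State"] != "India", "Total Individuals Vaccinated"].sum()
print(with_national, states_only)  # 600.0 300.0
```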
#rename 'Total Individuals Vaccinated' to the shorter 'Total'
vaccination.rename(columns ={'Total Individuals Vaccinated' : 'Total'}, inplace=True)
vaccination.head()
|  | Vaccine_Date | State | Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | Covaxin (Doses Administered) | CoviShield (Doses Administered) | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | Male(Individuals Vaccinated) | Female(Individuals Vaccinated) | Transgender(Individuals Vaccinated) | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 212 | 16/01/2021 | Andaman and Nicobar Islands | 23.0 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | NaN | NaN | NaN | NaN | NaN | NaN | 23.0 |
| 213 | 17/01/2021 | Andaman and Nicobar Islands | 23.0 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | NaN | NaN | NaN | NaN | NaN | NaN | 23.0 |
| 214 | 18/01/2021 | Andaman and Nicobar Islands | 42.0 | 9.0 | 2.0 | 42.0 | 0.0 | 29.0 | 13.0 | 0.0 | 0.0 | 42.0 | NaN | NaN | NaN | NaN | NaN | NaN | 42.0 |
| 215 | 19/01/2021 | Andaman and Nicobar Islands | 89.0 | 12.0 | 2.0 | 89.0 | 0.0 | 53.0 | 36.0 | 0.0 | 0.0 | 89.0 | NaN | NaN | NaN | NaN | NaN | NaN | 89.0 |
| 216 | 20/01/2021 | Andaman and Nicobar Islands | 124.0 | 16.0 | 3.0 | 124.0 | 0.0 | 67.0 | 57.0 | 0.0 | 0.0 | 124.0 | NaN | NaN | NaN | NaN | NaN | NaN | 124.0 |
#display the statewise table as a heatmap by shading each column with a background colour gradient
statewise.style.background_gradient(cmap = 'viridis_r')
| State/UnionTerritory | Confirmed | Cured | Deaths | Recovery_Rate | Mortality_Rate |
|---|---|---|---|---|---|
| Maharashtra | 6363442 | 6159676 | 134201 | 96.797865 | 2.108937 |
| Kerala | 3586693 | 3396184 | 18004 | 94.688450 | 0.501967 |
| Karnataka | 2921049 | 2861499 | 36848 | 97.961349 | 1.261465 |
| Tamil Nadu | 2579130 | 2524400 | 34367 | 97.877967 | 1.332504 |
| Andhra Pradesh | 1985182 | 1952736 | 13564 | 98.365591 | 0.683262 |
| Uttar Pradesh | 1708812 | 1685492 | 22775 | 98.635309 | 1.332797 |
| West Bengal | 1534999 | 1506532 | 18252 | 98.145471 | 1.189056 |
| Delhi | 1436852 | 1411280 | 25068 | 98.220276 | 1.744647 |
| Chhattisgarh | 1003356 | 988189 | 13544 | 98.488373 | 1.349870 |
| Odisha | 988997 | 972710 | 6565 | 98.353180 | 0.663804 |
| Rajasthan | 953851 | 944700 | 8954 | 99.040626 | 0.938721 |
| Gujarat | 825085 | 814802 | 10077 | 98.753704 | 1.221329 |
| Madhya Pradesh | 791980 | 781330 | 10514 | 98.655269 | 1.327559 |
| Madhya Pradesh*** | 791656 | 780735 | 10506 | 98.620487 | 1.327092 |
| Haryana | 770114 | 759790 | 9652 | 98.659419 | 1.253321 |
| Bihar | 725279 | 715352 | 9646 | 98.631285 | 1.329971 |
| Bihar**** | 715730 | 701234 | 9452 | 97.974655 | 1.320610 |
| Telangana | 650353 | 638410 | 3831 | 98.163613 | 0.589065 |
| Punjab | 599573 | 582791 | 16322 | 97.201008 | 2.722271 |
| Assam | 576149 | 559684 | 5420 | 97.142232 | 0.940729 |
| Telengana | 443360 | 362160 | 2312 | 81.685312 | 0.521472 |
| Jharkhand | 347440 | 342102 | 5130 | 98.463620 | 1.476514 |
| Uttarakhand | 342462 | 334650 | 7368 | 97.718871 | 2.151480 |
| Jammu and Kashmir | 322771 | 317081 | 4392 | 98.237140 | 1.360717 |
| Himachal Pradesh | 208616 | 202761 | 3537 | 97.193408 | 1.695460 |
| Himanchal Pradesh | 204516 | 200040 | 3507 | 97.811418 | 1.714780 |
| Goa | 172085 | 167978 | 3164 | 97.613389 | 1.838626 |
| Puducherry | 121766 | 119115 | 1800 | 97.822873 | 1.478245 |
| Manipur | 105424 | 96776 | 1664 | 91.796934 | 1.578388 |
| Tripura | 80660 | 77811 | 773 | 96.467890 | 0.958344 |
| Meghalaya | 69769 | 64157 | 1185 | 91.956313 | 1.698462 |
| Chandigarh | 61992 | 61150 | 811 | 98.641760 | 1.308233 |
| Arunachal Pradesh | 50605 | 47821 | 248 | 94.498567 | 0.490070 |
| Mizoram | 46320 | 33722 | 171 | 72.802245 | 0.369171 |
| Nagaland | 28811 | 26852 | 585 | 93.200514 | 2.030474 |
| Sikkim | 28018 | 25095 | 356 | 89.567421 | 1.270612 |
| Ladakh | 20411 | 20130 | 207 | 98.623291 | 1.014159 |
| Dadra and Nagar Haveli and Daman and Diu | 10654 | 10646 | 4 | 99.924911 | 0.037545 |
| Dadra and Nagar Haveli | 10377 | 10261 | 4 | 98.882143 | 0.038547 |
| Lakshadweep | 10263 | 10165 | 51 | 99.045114 | 0.496931 |
| Cases being reassigned to states | 9265 | 0 | 0 | 0.000000 | 0.000000 |
| Andaman and Nicobar Islands | 7548 | 7412 | 129 | 98.198198 | 1.709062 |
| Unassigned | 77 | 0 | 0 | 0.000000 | 0.000000 |
| Daman & Diu | 2 | 0 | 0 | 0.000000 | 0.000000 |
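Two entries in this table, 'Cases being reassigned to states' and 'Unassigned', are not states or union territories, and they show 0% recovery only because no cured or death counts are attached to them. A sketch of removing them before ranking; the frame below copies the Confirmed values for three rows of the table above, and the name `NON_STATE_LABELS` is hypothetical:

```python
import pandas as pd

# Small slice of the statewise table above (Confirmed column only)
statewise_toy = pd.DataFrame(
    {"Confirmed": [9265, 77, 7548]},
    index=pd.Index(
        ["Cases being reassigned to states", "Unassigned", "Andaman and Nicobar Islands"],
        name="State/UnionTerritory",
    ),
)

# Labels that are bookkeeping entries rather than places
NON_STATE_LABELS = ["Cases being reassigned to states", "Unassigned"]

# errors="ignore" keeps this safe if a label is already absent
statewise_clean = statewise_toy.drop(index=NON_STATE_LABELS, errors="ignore")
print(statewise_clean.index.tolist())  # ['Andaman and Nicobar Islands']
```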
#group the dataset by state and sort it by a descending number of active cases
top_10_active_states= covid_df.groupby(by ='State/UnionTerritory').max()[['Active_Cases', 'Date']].sort_values(by = ['Active_Cases'],ascending=False).reset_index()
#make a plot to show the top 10 states with the most active cases
fig = plt.figure(figsize=(16,9))
plt.title('Top 10 States with most active cases in India', fontsize=15)
ax = sns.barplot(data=top_10_active_states.iloc[:10], y='Active_Cases', x='State/UnionTerritory', linewidth=1, color =sns.color_palette()[0], edgecolor ='black')
plt.xlabel('States')
plt.ylabel('Total Active Cases')
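The grouping above takes the maximum of every column, so the Date it keeps is each state's latest report date, not the date of its peak. A sketch on hypothetical rows that selects the column first, ranks with `Series.nlargest`, and uses `idxmax` to recover the date on which each state's active cases peaked:

```python
import pandas as pd

# Hypothetical daily rows standing in for covid_df
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-04-01", "2021-05-01", "2021-04-01", "2021-05-01"]),
    "State/UnionTerritory": ["X", "X", "Y", "Y"],
    "Active_Cases": [500, 300, 100, 400],
})

# Peak active cases per state, largest first (n=10 on the real data)
peak_active = toy.groupby("State/UnionTerritory")["Active_Cases"].max().nlargest(2)

# Row label of each state's peak, then the matching rows (and their dates)
peak_rows = toy.loc[toy.groupby("State/UnionTerritory")["Active_Cases"].idxmax()]
print(peak_active)
print(peak_rows)
```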
#group the dataset and sort it by the descending order of number of death cases
top_10_deaths = covid_df.groupby(by ='State/UnionTerritory').max()[['Deaths', 'Date']].sort_values(by = ['Deaths'],ascending=False).reset_index()
#make a plot to show the top ten states with most deaths in India
fig = plt.figure(figsize=(16,9))
plt.title('Top 10 States with most deaths in India', fontsize=15)
ax = sns.barplot(data=top_10_deaths.iloc[:10], y='Deaths', x='State/UnionTerritory', linewidth=1, color =sns.color_palette()[0], edgecolor = 'black')
plt.xlabel('States')
plt.ylabel('Total Deaths')
#make a plot to show the growth trend for the top 5 most affected states
fig = plt.figure(figsize=(12,6))
ax=sns.lineplot(data=covid_df[covid_df['State/UnionTerritory'].isin(['Maharashtra','Karnataka', 'Tamil Nadu', 'Delhi', 'Uttar Pradesh'])], x='Date', y='Active_Cases', hue = 'State/UnionTerritory')
ax.set_title('Top 5 Affected States in India', size=15)
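The five states in the line plot are hard-coded. A sketch that derives them from the statewise pivot instead, ranking by Confirmed (an assumption; ranking by peak active cases could produce a different list), shown on hypothetical stand-in frames:

```python
import pandas as pd

# Hypothetical stand-in for the statewise pivot
statewise_toy = pd.DataFrame(
    {"Confirmed": [600, 500, 400, 300, 200, 100]},
    index=pd.Index(list("ABCDEF"), name="State/UnionTerritory"),
)

# Five states with the most confirmed cases
top_5 = statewise_toy.nlargest(5, "Confirmed").index.tolist()
print(top_5)  # ['A', 'B', 'C', 'D', 'E']

# Hypothetical daily rows standing in for covid_df; keep only the top-5 states
daily = pd.DataFrame({"State/UnionTerritory": ["A", "F", "C"], "Active_Cases": [1, 2, 3]})
subset = daily[daily["State/UnionTerritory"].isin(top_5)]
```

The filtered frame can then be passed to `sns.lineplot` exactly as in the cell above.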
#make a pie chart to show the percentage of each gender that has been vaccinated
male = vaccine_df['Male(Individuals Vaccinated)'].sum()
female = vaccine_df['Female(Individuals Vaccinated)'].sum()
px.pie(names=['Male', 'Female'], values=[male,female], title='Male and Female Vaccination')
I did not include transgender individuals in the pie chart because they make up only about 0.01% of the vaccinated individuals in the dataset, a negligible share.
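The transgender share can be computed from the same column sums used for the pie chart. A sketch with hypothetical totals standing in for the sums of the Male, Female and Transgender(Individuals Vaccinated) columns:

```python
# Hypothetical column totals (not the dataset's values)
male, female, transgender = 5_000_000.0, 4_900_000.0, 1_000.0

total = male + female + transgender

# Percentage share of each group in the combined total
shares = {
    name: value * 100 / total
    for name, value in [("Male", male), ("Female", female), ("Transgender", transgender)]
}
print({name: round(pct, 3) for name, pct in shares.items()})
```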
#make a pie chart to show the share of doses administered for each vaccine
#note: the Covaxin column name begins with a space in the CSV
covaxin = vaccination[' Covaxin (Doses Administered)'].sum()
covishield = vaccination['CoviShield (Doses Administered)'].sum()
px.pie(names=['Covaxin', 'CoviShield'], values=[covaxin, covishield], title='Percentage of Doses Administered: Covaxin vs CoviShield')
#make a doughnut chart to show the distribution of vaccinated individuals across age groups
Age18_to_44 = vaccination['18-44 Years(Individuals Vaccinated)'].sum()
Age45_to_60= vaccination['45-60 Years(Individuals Vaccinated)'].sum()
Above_60= vaccination['60+ Years(Individuals Vaccinated)'].sum()
px.pie(names=['18-44', '45-60', '60+'], values=[Age18_to_44, Age45_to_60, Above_60], title='Distribution of Covid Vaccines among Age Groups', hole = .5)
#group the vaccine dataset by state and sort in descending order the total number of vaccinations for each state
max_vac = vaccination.groupby('State')['Total'].sum().to_frame('Total')
max_vac = max_vac.sort_values('Total', ascending = False)[:5]
max_vac
| State | Total |
|---|---|
| Maharashtra | 1.403075e+09 |
| Uttar Pradesh | 1.200575e+09 |
| Rajasthan | 1.141163e+09 |
| Gujarat | 1.078261e+09 |
| West Bengal | 9.250227e+08 |
#make a plot to show the top five vaccinated states
fig = plt.figure(figsize=(10,5))
plt.title('Top 5 Vaccinated States in India', size = 15)
x = sns.barplot(data=max_vac, y = max_vac['Total'], x=max_vac.index, color=sns.color_palette()[0], edgecolor = 'black')
#group the vaccine dataset by state and sort the total number of vaccinations for each state in ascending order
min_vac = vaccination.groupby('State')['Total'].sum().to_frame('Total')
min_vac = min_vac.sort_values('Total', ascending = True)[:5]
min_vac
| State | Total |
|---|---|
| Lakshadweep | 2124715.0 |
| Andaman and Nicobar Islands | 8102125.0 |
| Ladakh | 9466289.0 |
| Dadra and Nagar Haveli and Daman and Diu | 11358600.0 |
| Sikkim | 16136752.0 |
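The Total column appears to be cumulative (the national counts in the head() output only ever increase), so summing it over all dates counts the same people again on every day they remain in the running total. That is why Maharashtra's sum (about 1.4 billion) exceeds the state's population. A sketch on hypothetical rows that ranks states by their latest cumulative total instead, assuming Vaccine_Date can be parsed as day/month/year:

```python
import pandas as pd

# Hypothetical cumulative rows standing in for the vaccination frame
toy = pd.DataFrame({
    "Vaccine_Date": ["16/01/2021", "17/01/2021", "16/01/2021", "17/01/2021"],
    "State": ["X", "X", "Y", "Y"],
    "Total": [10.0, 30.0, 25.0, 26.0],
})
toy["Vaccine_Date"] = pd.to_datetime(toy["Vaccine_Date"], format="%d/%m/%Y")

# Summing cumulative values inflates totals: X -> 40, Y -> 51
summed = toy.groupby("State")["Total"].sum()

# Latest reported cumulative total per state: X -> 30, Y -> 26
latest = (
    toy.dropna(subset=["Total"])
    .sort_values("Vaccine_Date")
    .groupby("State")["Total"]
    .last()
)
print(latest.nlargest(5))   # most vaccinated
print(latest.nsmallest(5))  # least vaccinated
```

In this toy example the two approaches even disagree on which state ranks first, which is why the choice matters for the top-5 and bottom-5 plots.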
#make a plot to show the five least vaccinated states in the dataset
fig = plt.figure(figsize=(13,8))
plt.title('Least 5 Vaccinated States in India', size = 15)
x = sns.barplot(data=min_vac, y = min_vac['Total'], x=min_vac.index, color=sns.color_palette()[0], edgecolor = 'black')
#save the statewise summary table to a CSV file in the working directory
statewise.to_csv('statewise.csv')